To facilitate research on text generation, this paper presents a comprehensive and unified library, TextBox 2.0, focusing on the use of pre-trained language models (PLMs). To be comprehensive, our library covers $13$ common text generation tasks and their corresponding $83$ datasets and further incorporates $45$ PLMs covering general, translation, Chinese, dialogue, controllable, distilled, prompting, and lightweight PLMs. We also implement $4$ efficient training strategies and provide $4$ generation objectives for pre-training new PLMs from scratch. To be unified, we design the interfaces to support the entire research pipeline (from data loading to training and evaluation), ensuring that each step can be fulfilled in a unified way. Despite the rich functionality, it is easy to use our library, either through the friendly Python API or command line. To validate the effectiveness of our library, we conduct extensive experiments and exemplify four types of research scenarios. The project is released at the link: https://github.com/RUCAIBox/TextBox.
The number of international benchmarking competitions is steadily increasing in various fields of machine learning (ML) research and practice. So far, however, little is known about common practice or about the bottlenecks the community faces in tackling the research questions posed. To shed light on the status quo of algorithm development in the specific field of biomedical imaging analysis, we designed an international survey that was issued to all participants of challenges conducted in conjunction with the IEEE ISBI 2021 and MICCAI 2021 conferences (80 competitions in total). The survey covered participants' expertise and working environments, their chosen strategies, and algorithm characteristics. A median of 72% of challenge participants took part in the survey. According to our results, knowledge exchange was the primary incentive (70%) for participation, while the reception of prize money played only a minor role (16%). While a median of 80 working hours was spent on method development, a large portion of participants (32%) stated that they did not have enough time for it, and 25% perceived the infrastructure to be a bottleneck. Overall, 94% of all solutions were deep learning-based. Of these, 84% were based on standard architectures. 43% of the respondents reported that the data samples (e.g., images) were too large to be processed at once. This was most commonly addressed by patch-based training (69%), downsampling (37%), and solving 3D analysis tasks as a series of 2D tasks. K-fold cross-validation on the training set was performed by only 37% of the participants, and only 50% of the participants performed ensembling, based on either multiple identical models (61%) or heterogeneous models (39%). 48% of the respondents applied postprocessing steps.
Retrieval-augmented Neural Machine Translation (NMT) models have been successful in many translation scenarios. Unlike previous work that uses mutually similar but redundant translation memories~(TMs), we propose a new retrieval-augmented NMT model that uses contrastively retrieved translation memories, which are holistically similar to the source sentence while individually contrastive to each other, providing maximal information gain in three phases. First, in the TM retrieval phase, we adopt a contrastive retrieval algorithm to avoid the redundancy and uninformativeness of similar translation pieces. Second, in the memory encoding stage, given a set of TMs, we propose a novel Hierarchical Group Attention module to gather both the local context of each TM and the global context of the whole TM set. Finally, in the training phase, a Multi-TM contrastive learning objective is introduced to learn the salient features of each TM with respect to the target sentence. Experimental results show that our framework obtains improvements over strong baselines on the benchmark datasets.
3D object detection has recently received increasing attention in autonomous driving. Objects in 3D scenes are distributed with diverse orientations. Ordinary detectors do not explicitly model the variations of rotation and reflection transformations. Consequently, large networks and extensive data augmentation are required for robust detection. Recent equivariant networks explicitly model the transformation variations by applying shared networks to multiple transformed point clouds, showing great potential in object geometry modeling. However, it is difficult to apply such networks to 3D object detection in autonomous driving because of their large computation cost and slow inference speed. In this work, we present TED, an efficient Transformation-Equivariant 3D Detector, to overcome the computation cost and speed issues. TED first applies a sparse convolution backbone to extract multi-channel transformation-equivariant voxel features, and then aligns and aggregates these equivariant features into lightweight and compact representations for high-performance 3D object detection. On the highly competitive KITTI 3D car detection leaderboard, TED ranked 1st among all submissions with competitive efficiency.
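Stereo matching is a fundamental building block for many vision and robotics applications. An informative and concise cost volume representation is vital for stereo matching with high accuracy and efficiency. In this paper, we present a novel cost volume construction method, named attention concatenation volume (ACV), which generates attention weights from correlation clues to suppress redundant information and enhance matching-related information in the concatenation volume. ACV can be seamlessly embedded into most stereo matching networks, and the resulting networks can use a more lightweight aggregation network while achieving higher accuracy. We further design a fast version of ACV to enable real-time performance, named Fast-ACV, which generates high-likelihood disparity hypotheses and the corresponding attention weights from low-resolution correlation clues, significantly reducing computational and memory cost while maintaining satisfactory accuracy. The core idea of Fast-ACV is Volume Attention Propagation (VAP), which automatically selects accurate correlation values from an upsampled correlation volume and propagates these accurate values to surrounding pixels that have ambiguous correlation clues. Furthermore, based on ACV and Fast-ACV respectively, we design a highly accurate network, ACVNet, and a real-time network, Fast-ACVNet, which achieve state-of-the-art performance on several benchmarks (i.e., our ACVNet ranks 2nd on KITTI 2015 and Scene Flow, and 3rd on KITTI 2012 and ETH3D among all published methods; our Fast-ACVNet outperforms almost all state-of-the-art real-time methods on Scene Flow, KITTI 2012 and KITTI 2015, while also showing better generalization ability).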
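The attention-filtering idea in the abstract above can be illustrated with a minimal sketch on a single scanline. This is not the authors' implementation: the function name `attention_concat_volume`, the use of a plain softmax over disparities as the attention weights, and the toy feature layout are all assumptions made for illustration (the abstract only says that the weights are generated from correlation clues).

```python
import math

def attention_concat_volume(left, right, max_disp):
    """Hypothetical sketch: filter a concatenation volume with correlation-derived attention.

    left, right: per-pixel feature vectors along one scanline, shape [W][C].
    Returns (attn, volume) with shapes [D][W] and [D][W][2C]. Requires max_disp >= 1.
    """
    width, channels = len(left), len(left[0])
    zeros = [0.0] * (2 * channels)

    # Correlation clue: mean feature product between left pixel x and right pixel x - d.
    # Out-of-image matches get -inf so they receive zero attention after the softmax.
    corr = [[sum(l * r for l, r in zip(left[x], right[x - d])) / channels
             if x - d >= 0 else float("-inf")
             for x in range(width)]
            for d in range(max_disp)]

    # Assumption: attention = softmax over the disparity axis, computed per pixel.
    attn = [[0.0] * width for _ in range(max_disp)]
    for x in range(width):
        column = [corr[d][x] for d in range(max_disp)]
        peak = max(column)  # d = 0 is always valid, so peak is finite
        exps = [math.exp(c - peak) for c in column]
        total = sum(exps)
        for d in range(max_disp):
            attn[d][x] = exps[d] / total

    # Concatenation volume (left features joined with shifted right features),
    # scaled by the attention weight of each disparity hypothesis.
    volume = [[[attn[d][x] * v for v in (left[x] + right[x - d])]
               if x - d >= 0 else list(zeros)
               for x in range(width)]
              for d in range(max_disp)]
    return attn, volume


left = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
right = [[0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]
attn, volume = attention_concat_volume(left, right, max_disp=2)
```

Disparity hypotheses with weak correlation end up with small weights, so their concatenated features contribute little to whatever aggregation step follows. A learned attention network could replace the softmax without changing the overall structure.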
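Over the past few years, the rapid development of deep learning technologies for computer vision has greatly improved the performance of medical image segmentation (MedISeg). However, recent MedISeg publications usually focus on presenting their major contributions (e.g., network architectures, training strategies, and loss functions) while unwittingly ignoring some marginal implementation details (also known as "tricks"), which can lead to unfair comparisons of experimental results. In this paper, we collect a series of MedISeg tricks for different model implementation phases (i.e., pre-training model, data pre-processing, data augmentation, model implementation, model inference, and result post-processing), and experimentally explore the effectiveness of these tricks on consistent baseline models. Compared to paper-driven surveys that focus only on analyzing the advantages and limitations of segmentation models, our work provides a large number of solid experiments and is more technically operable. With extensive experimental results on representative 2D and 3D medical image datasets, we explicitly clarify the effect of these tricks. Moreover, based on the surveyed tricks, we also open-source a strong MedISeg repository, each component of which is plug-and-play. We believe that this milestone work not only completes a comprehensive and complementary survey of state-of-the-art MedISeg approaches, but also offers a practical guide for addressing future medical image processing challenges, including but not limited to small dataset learning, class imbalance learning, multi-modality learning, and domain adaptation. The code has been released at: https://github.com/hust-linyi/mediseg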
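Interacting safely with other traffic participants is one of the core requirements for autonomous driving, especially at intersections and under occlusions. Most existing approaches are designed for particular scenarios and require significant human labor in parameter tuning before they can be applied to different situations. To solve this problem, we first propose a learning-based Interaction Point Model (IPM), which describes the interaction between agents in a unified manner using the protection time and the interaction priority. We further integrate the proposed IPM into a novel planning framework and demonstrate its effectiveness and robustness through comprehensive simulations in highly dynamic environments.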
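This paper reviews the Challenge on Super-Resolution of Compressed Image and Video at AIM 2022. The challenge includes two tracks. Track 1 targets the super-resolution of compressed images, and Track 2 targets the super-resolution of compressed video. In Track 1, we use the popular DIV2K dataset as the training, validation, and test sets. In Track 2, we propose the LDV 3.0 dataset, which contains 365 videos: the LDV 2.0 dataset (335 videos) plus 30 additional videos. In this challenge, 12 teams and 2 teams submitted final results to Track 1 and Track 2, respectively. The proposed methods and solutions gauge the state of the art in super-resolution of compressed images and video. The proposed LDV 3.0 dataset is available at https://github.com/renyang-home/ldv_dataset. The homepage of this challenge is at https://github.com/renyang-home/aim22_compresssr.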
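Current efficient LiDAR-based detection frameworks fall short in exploiting object relations, which naturally exist in both space and time. To this end, we introduce a simple, efficient, and effective two-stage detector, termed Ret3D. At the core of Ret3D are novel intra-frame and inter-frame relation modules that capture spatial and temporal relations, respectively. More specifically, the intra-frame relation module (IntraRM) encapsulates the objects within a frame into a sparse graph, allowing us to refine object features through efficient message passing. The inter-frame relation module (InterRM), on the other hand, densely and dynamically connects each object within its corresponding tracked sequence, and leverages this temporal information to efficiently enhance its representation through a lightweight transformer network. We instantiate our novel designs of IntraRM and InterRM with center-based or anchor-based detectors and evaluate them on the Waymo Open Dataset (WOD). With negligible extra overhead, Ret3D achieves state-of-the-art performance, exceeding the recent competitor by 5.5% and 3.2% on vehicle detection in terms of the LEVEL 1 and LEVEL 2 mAPH metrics, respectively.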
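Although many methods have been proposed to enhance the transferability of adversarial perturbations, these methods are designed in a heuristic manner, and the essential mechanism for improving adversarial transferability remains unclear. This paper summarizes, in a unified view, the common mechanism shared by twelve previous transferability-boosting methods: these methods all reduce game-theoretic interactions between regional adversarial perturbations. To this end, we focus on the attacking utility of all interactions between regional adversarial perturbations, and we first discover and prove a negative correlation between adversarial transferability and the attacking utility of interactions. Based on this finding, we theoretically prove and empirically verify that the twelve previous transferability-boosting methods all reduce interactions between regional adversarial perturbations. More crucially, we consider the reduction of interactions to be the essential reason for the enhancement of adversarial transferability. Furthermore, we design an interaction loss to directly penalize interactions between regional adversarial perturbations during the attack. Experimental results show that the interaction loss significantly improves the transferability of adversarial perturbations.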